Routing on the Channel Dependency Graph:: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks

نویسنده

  • Jens Domke
چکیده

In the pursuit for ever-increasing compute power, and with Moore’s law slowly coming to an end, highperformance computing started to scale-out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topologyaware routing algorithms become increasingly inapplicable, due to irregular topologies, which either are irregular by design, or most often a result of hardware failures. Exchanging faulty network components potentially requires whole system downtime further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today’s networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardwareand software-management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. Although, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system. This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while it is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to solve the routing-deadlock problem. Therefore, this thesis further advances the state-of-the-art by introducing a novel concept of routing on the channel dependency graph, which allows the design of an universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock-avoidance during path calculation, instead of solving both problems separately as all previous solutions.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Necessary and Sufficient Conditions for Deadlock-free Networks

In this paper we develop a new and generic theory about the necessary and sufficient conditions for deadlock-free routing in the interconnection networks An extension of the channel dependency graph described by Dally is defined, the channel dynamic dependency graph. The main achievement of this new concept is consecuence of introducing the concept of time and the flow control function in its d...

متن کامل

Multiprocessor Interconnection Networks

A deadlock-free routing algorithm can be generated for arbitrary interconnection networks using the concept of virtual channels. A necessary and sufficient condition for deadlock-free routing is the absence of cycles in a channel dependency graph. Given an arbitrary network and a routing function, the cycles of the channel dependency graph can be removed by splitting physical channels into grou...

متن کامل

Turn Prohibition Based Routing in Irregular Computer Networks

In this paper we study turn prohibition based multicast wormhole message routing in computer networks with irregular interconnection graphs. Routing is accomplished in two phases. When a topology change is detected the first phase of routing is executed. Here the network graph is analyzed and a set of turns to be prohibited to break all cycles in the channel dependency graph is created. The tur...

متن کامل

Tree-Based Multicasting on Wormhole-Routed Star Graph Interconnection Networks with Hamiltonian Path

Multicast is an important collective communication operation on multicomputer systems, in which the same message is delivered from a source node to an arbitrary number of destination nodes. The star graph interconnection network has been recognized as an attractive alternative to the popular hypercube network. In our previous work on wormhole star graph networks routing, we proposed a path-base...

متن کامل

Performance Analysis of a New Neural Network for Routing in Mesh Interconnection Networks

Routing is one of the basic parts of a message passing multiprocessor system. The routing procedure has a great impact on the efficiency of a system. Neural algorithms that are currently in use for computer networks require a large number of neurons. If a specific topology of a multiprocessor network is considered, the number of neurons can be reduced. In this paper a new recurrent neural ne...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017